@wking wking commented Aug 25, 2025

The ConfigMap retrieval occasionally fails:

$ curl -s 'https://search.dptools.openshift.org/search?maxAge=24h&type=junit&context=0&search=error%20accessing%20microshift-version%20configmap' | jq -r 'to_entries[] | .key as $k | .value | to_entries[].value[].context[] | $k + "\n  " + .'
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-techpreview/1959742094134218752
    I0824 23:53:47.495715 90800 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-hfy5sixp-82aa7.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-hfy5sixp-82aa7.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: read udp 10.129.136.246:33085->172.30.0.10:53: i/o timeout
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-cgroupsv2/1959742058285502464
    I0824 23:53:34.234531 68956 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-950shpk8-29422.XXXXXXXXXXXXXXXXXXXXXX:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-950shpk8-29422.XXXXXXXXXXXXXXXXXXXXXX on 172.30.0.10:53: read udp 10.130.176.97:35168->172.30.0.10:53: i/o timeout
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.14-to-4.15-to-4.16-to-4.17-ci/1959761685916946432
    I0825 04:30:25.015456 907 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-wwl4bdif-d25a9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-wwl4bdif-d25a9.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/30067/pull-ci-openshift-origin-main-e2e-hypershift-conformance/1959846453614481408
    I0825 07:00:44.838632 72570 framework.go:2320] error accessing microshift-version configmap: Get "https://add640066b4a046d58cb9cc48def1e09-925574a45e949171.elb.us-east-1.amazonaws.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup add640066b4a046d58cb9cc48def1e09-925574a45e949171.elb.us-east-1.amazonaws.com on 172.30.0.10:53: read udp 172.24.238.50:43260->172.30.0.10:53: i/o timeout

Placing it within a poll loop will avoid failing test-cases while they try to decide if the cluster is MicroShift or not, which is likely to be part of setup (e.g. "skip this test-case if the cluster is MicroShift"), and not the fundamental thing the test-case is trying to exercise.
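As a rough illustration of the poll-loop idea, here is a minimal Go sketch that uses only the standard library. It is not the code from this pull request: `isMicroShift`, `getVersionFunc`, `flakyGet` and `errNotFound` are hypothetical names, and treating "ConfigMap exists" as "cluster is MicroShift" is an assumption made for the example. In real test code, a helper such as `wait.PollUntilContextTimeout` from k8s.io/apimachinery could play the role of the hand-written loop.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical stand-in for the API server's NotFound error.
var errNotFound = errors.New("configmap not found")

// getVersionFunc is a hypothetical stand-in for the ConfigMap Get call.
type getVersionFunc func(ctx context.Context) (map[string]string, error)

// isMicroShift retries get until it gives a definitive answer or the
// hard-coded timeout expires. Transient errors (a DNS timeout, for example)
// are retried; success and NotFound are both treated as definitive.
func isMicroShift(get getVersionFunc, interval, timeout time.Duration) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		_, err := get(ctx)
		if err == nil {
			return true, nil // ConfigMap exists (assumed to mean MicroShift)
		}
		if errors.Is(err, errNotFound) {
			return false, nil // definitive: the ConfigMap is absent
		}
		select {
		case <-ctx.Done():
			return false, fmt.Errorf("giving up after %s: %w", timeout, err)
		case <-ticker.C:
			// wait one interval, then try again
		}
	}
}

// flakyGet returns a getter that fails with a transient error the given
// number of times, then returns final (or a ConfigMap if final is nil).
func flakyGet(failures int, final error) getVersionFunc {
	return func(ctx context.Context) (map[string]string, error) {
		if failures > 0 {
			failures--
			return nil, errors.New("i/o timeout")
		}
		if final != nil {
			return nil, final
		}
		return map[string]string{"version": "example"}, nil
	}
}
```

A call such as `isMicroShift(flakyGet(2, nil), time.Second, 30*time.Second)` would ride out two transient failures before answering; the interval and timeout values here are illustrative only.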

Ideally there would be no hard-coded duration, and IsMicroShiftCluster would take a Context argument set up with whatever the caller felt was a reasonable time to make that determination. But I didn't want to get into adjusting all the IsMicroShiftCluster callers, so for now, I'm just hard-coding the duration.
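The Context-based alternative described above could look roughly like the following sketch, where the caller decides how long the determination may take. Again this is only an illustration using the standard library: `isMicroShiftCtx`, `detectWithin`, `lookupFunc`, `always`, `errNotFound` and `errTransient` are hypothetical names, not identifiers from the origin repository.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// Hypothetical sentinel errors for the example.
var (
	errNotFound  = errors.New("configmap not found")
	errTransient = errors.New("i/o timeout")
)

// lookupFunc is a hypothetical stand-in for the ConfigMap Get call.
type lookupFunc func(ctx context.Context) error

// isMicroShiftCtx polls lookup until it gives a definitive answer or the
// caller's context is done. No duration is hard-coded here apart from the
// retry interval, which the caller also supplies.
func isMicroShiftCtx(ctx context.Context, lookup lookupFunc, interval time.Duration) (bool, error) {
	for {
		err := lookup(ctx)
		if err == nil {
			return true, nil
		}
		if errors.Is(err, errNotFound) {
			return false, nil
		}
		select {
		case <-ctx.Done():
			// Report both why we stopped and the last lookup error.
			return false, errors.Join(ctx.Err(), err)
		case <-time.After(interval):
		}
	}
}

// detectWithin shows a caller choosing its own time budget.
func detectWithin(budget time.Duration, lookup lookupFunc) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return isMicroShiftCtx(ctx, lookup, budget/10)
}

// always returns a lookup that yields the same result on every call.
func always(err error) lookupFunc {
	return func(context.Context) error { return err }
}
```

With this shape, a setup step that can afford to wait might pass a generous deadline, while a caller with a tight budget passes a short one; the cost is that every existing caller has to be updated to supply a Context.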

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 25, 2025
@openshift-ci-robot

@wking: This pull request explicitly references no jira issue.

@openshift-ci openshift-ci bot requested review from deads2k and sjenning August 25, 2025 12:16
@stbenjam
Member

/lgtm

Member

@sosiouxme sosiouxme left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2025
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD cc8ae10 and 2 for PR HEAD d63d774 in total

@wking
Member Author

wking commented Sep 8, 2025

/verified by examining all serial presubmits, no failures identified for these tests
/retest-required

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Sep 8, 2025
@openshift-ci-robot

@wking: This PR has been marked as verified by examining all serial presubmits, no failures identified for these tests.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD bc0ce7e and 2 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 61d6229 and 1 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 61d6229 and 2 for PR HEAD d63d774 in total

@wking
Member Author

wking commented Sep 9, 2025

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 841b786 and 1 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 54734aa and 2 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD c21df2e and 1 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD c21df2e and 2 for PR HEAD d63d774 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 130265e and 1 for PR HEAD d63d774 in total

@wking wking force-pushed the harden-IsMicroShiftCluster branch from d63d774 to 18c8314 Compare September 10, 2025 20:55
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Sep 10, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 10, 2025

openshift-trt bot commented Sep 11, 2025

Job Failure Risk Analysis for sha: 18c8314

Job Name: Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive: IncompleteTests
    Tests for this run (106) are below the historical average (286): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn: High
    [sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel]
    This test has passed 99.90% of 1948 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2: High
    [sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel]
    This test has passed 99.90% of 1948 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones: High
    [sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel]
    This test has passed 99.90% of 1948 runs on release 4.21 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-proxy: High
    [sig-cli] oc explain should contain proper spec+status for CRDs [Suite:openshift/conformance/parallel]
    This test has passed 99.90% of 1948 runs on release 4.21 [Overall] in the last week.

@wking wking force-pushed the harden-IsMicroShiftCluster branch from 18c8314 to 3170441 Compare September 11, 2025 03:48
The ConfigMap retrieval occasionally fails:

  $ curl -s 'https://search.dptools.openshift.org/search?maxAge=24h&type=junit&context=0&search=error%20accessing%20microshift-version%20configmap' | jq -r 'to_entries[] | .key as $k | .value | to_entries[].value[].context[] | $k + "\n  " + .'
  https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-techpreview/1959742094134218752
      I0824 23:53:47.495715 90800 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-hfy5sixp-82aa7.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-hfy5sixp-82aa7.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: read udp 10.129.136.246:33085->172.30.0.10:53: i/o timeout
  https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-cgroupsv2/1959742058285502464
      I0824 23:53:34.234531 68956 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-950shpk8-29422.XXXXXXXXXXXXXXXXXXXXXX:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-950shpk8-29422.XXXXXXXXXXXXXXXXXXXXXX on 172.30.0.10:53: read udp 10.130.176.97:35168->172.30.0.10:53: i/o timeout
  https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.14-to-4.15-to-4.16-to-4.17-ci/1959761685916946432
      I0825 04:30:25.015456 907 framework.go:2294] error accessing microshift-version configmap: Get "https://api.ci-op-wwl4bdif-d25a9.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup api.ci-op-wwl4bdif-d25a9.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
  https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/30067/pull-ci-openshift-origin-main-e2e-hypershift-conformance/1959846453614481408
      I0825 07:00:44.838632 72570 framework.go:2320] error accessing microshift-version configmap: Get "https://add640066b4a046d58cb9cc48def1e09-925574a45e949171.elb.us-east-1.amazonaws.com:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": dial tcp: lookup add640066b4a046d58cb9cc48def1e09-925574a45e949171.elb.us-east-1.amazonaws.com on 172.30.0.10:53: read udp 172.24.238.50:43260->172.30.0.10:53: i/o timeout

Placing it within a poll loop will avoid failing test-cases while they
try to decide if the cluster is MicroShift or not, which is likely to
be part of setup (e.g. "skip this test-case if the cluster is
MicroShift"), and not the fundamental thing the test-case is trying to
exercise.

Ideally there would be no hard-coded duration, and IsMicroShiftCluster
would take a Context argument set up with whatever the caller felt was
a reasonable time to make that determination.  But I didn't want to
get into adjusting all the IsMicroShiftCluster callers, so for now,
I'm just hard-coding the duration.

In the kapierrs.IsNotFound(err) branch, I set 'cm' to nil.  I do not
understand why Get returns a zero-value &ConfigMap in the not-found
case, but it does.  When I inserted logging in this branch, the output
was [1]:

  I0910 23:13:46.463728 119748 framework.go:2322] microshift-version not found, but ConfigMap is non nil?
  configmaps "microshift-version" not found
  &ConfigMap{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Data:map[string]string{},BinaryData:map[string][]byte{},Immutable:nil,}

Setting 'cm' to nil in that case keeps the later
'microshift-version configmap not found' return working.

[1]: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/30161/pull-ci-openshift-origin-main-e2e-aws-ovn-fips/1965881831995740160
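
The non-nil result in the not-found case, and the reset to nil that works around it, can be sketched as follows. This is a self-contained illustration with hypothetical names (`ConfigMap`, `get`, `lookup`, `errNotFound`), not the real client code. The behaviour of `get` encodes one plausible explanation, stated here as an assumption and not confirmed in this thread: the client allocates its result object before issuing the request and returns that allocation even when the request fails.

```go
package main

import "errors"

// ConfigMap is a hypothetical, trimmed-down stand-in for the real type.
type ConfigMap struct {
	Data map[string]string
}

// errNotFound is a hypothetical stand-in for the API server's NotFound error.
var errNotFound = errors.New(`configmaps "microshift-version" not found`)

// get mimics a client that allocates its result up front and returns it
// alongside the error (an assumption about the cause, see the text above).
func get(store map[string]*ConfigMap, name string) (*ConfigMap, error) {
	result := &ConfigMap{Data: map[string]string{}}
	found, ok := store[name]
	if !ok {
		return result, errNotFound // non-nil, zero-value object plus an error
	}
	*result = *found
	return result, nil
}

// lookup normalizes the not-found case so that later code can rely on
// "cm == nil" meaning "the ConfigMap does not exist".
func lookup(store map[string]*ConfigMap, name string) (*ConfigMap, error) {
	cm, err := get(store, name)
	if errors.Is(err, errNotFound) {
		return nil, nil // discard the zero-value object
	}
	if err != nil {
		return nil, err
	}
	return cm, nil
}
```

Without the reset, a later `cm == nil` check would never fire for a missing ConfigMap, because the pointer returned next to the error is non-nil.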
@wking
Member Author

wking commented Sep 11, 2025

/verified by examining all failing serial and MicroShift presubmits, no failures identified for these tests
/retest-required

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Sep 11, 2025
@openshift-ci-robot

@wking: This PR has been marked as verified by examining all failing serial and MicroShift presubmits, no failures identified for these tests.

@hongkailiu
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 11, 2025
Contributor

openshift-ci bot commented Sep 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, sosiouxme, stbenjam, wking


openshift-trt bot commented Sep 11, 2025

Job Failure Risk Analysis for sha: 3170441

Job Name: Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive: IncompleteTests
    Tests for this run (31) are below the historical average (273): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 97f9052 and 2 for PR HEAD 3170441 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 821a738 into openshift:main Sep 12, 2025
34 of 47 checks passed
Contributor

openshift-ci bot commented Sep 12, 2025

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-single-node-serial 3170441 link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-metal-ipi-serial-ovn-ipv6-2of2 3170441 link false /test e2e-metal-ipi-serial-ovn-ipv6-2of2
ci/prow/e2e-openstack-ovn 3170441 link false /test e2e-openstack-ovn
ci/prow/e2e-aws-disruptive 3170441 link false /test e2e-aws-disruptive
ci/prow/e2e-gcp-ovn-techpreview 3170441 link false /test e2e-gcp-ovn-techpreview
ci/prow/e2e-agnostic-ovn-cmd 3170441 link false /test e2e-agnostic-ovn-cmd
ci/prow/e2e-aws-ovn-single-node 3170441 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-metal-ipi-virtualmedia 3170441 link false /test e2e-metal-ipi-virtualmedia
ci/prow/e2e-hypershift-conformance 3170441 link false /test e2e-hypershift-conformance
ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 3170441 link false /test e2e-gcp-ovn-techpreview-serial-2of2

@wking wking deleted the harden-IsMicroShiftCluster branch September 12, 2025 02:21
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
5 participants